A pointing error model is used in the antenna calibration process. Data from spacecraft
or radio star observations are used to determine the parameters in the model. However,
the regression variables are not truly independent, displaying a condition known as
multicollinearity. Ridge regression, a biased estimation technique, is used to combat the
multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking
(days 105 and 106 of 1987) were analyzed using both linear least squares and ridge
regression methods. The advantages and limitations of employing the technique are
presented. The problem is not yet fully resolved.


I. Introduction 

A pointing error model is used in the antenna calibration 
process to compensate for systematic error sources. Data from 
spacecraft (s/c) or radio star observations are used to deter- 
mine the parameters in the model. The model parameters are 
then used to generate a systematic error correction table for 
accurately pointing the antenna. The pointing error modeling 
approach used was originally devised by optical astronomers 
and subsequently adapted by radio astronomers for RF antennas.
The model is based on the expected physical behavior
of the antenna and has been successfully applied at many
radio astronomy facilities, including the Bonn 100-m Az-El antenna
[1] and the Haystack 37-m Az-El antenna [2]. The complete
pointing error model for an antenna is a sum of individual 
error functions. Table 1 shows the individual error sources and 
the elevation and cross-elevation (or, depending on the antenna 
mount, declination and cross-declination) error functions used 
to develop a systematic error correction table ([1], [2] and [3] 
give a more in-depth description of the parameters). 
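
To illustrate how the complete model is assembled as a sum of individual error functions, the following Python sketch builds the regressor columns for the az-el terms of Table 1. It assumes the table entries read P1, P2 cos(el), P3 sin(el), P4 sin(el) cos(az), P5 sin(el) sin(az), and P10 (az/360) cos(el) for cross-elevation, and -P4 sin(az), P5 cos(az), P7, P8 cos(el), and P9 cot(el) for elevation, with az in degrees in the scale term. The function names and array layout are hypothetical and are not taken from the DSN software.

```python
import numpy as np

# Labels for the az-el parameters assumed from Table 1 (no P6 is listed there).
PARAM_NAMES = ["P1", "P2", "P3", "P4", "P5", "P7", "P8", "P9", "P10"]


def azel_regressors(az_deg, el_deg):
    """Return (x_xel, x_el): one row per observation, one column per parameter.

    Column j holds the function that multiplies PARAM_NAMES[j] in the
    cross-elevation (x_xel) or elevation (x_el) error; zero means "no term".
    """
    az_deg = np.atleast_1d(np.asarray(az_deg, dtype=float))
    el_deg = np.atleast_1d(np.asarray(el_deg, dtype=float))
    az, el = np.radians(az_deg), np.radians(el_deg)
    one, zero = np.ones_like(az), np.zeros_like(az)

    x_xel = np.column_stack([
        one,                            # P1  az collimation
        np.cos(el),                     # P2  az encoder fixed offset
        np.sin(el),                     # P3  az/el skew
        np.sin(el) * np.cos(az),        # P4  az axis tilt
        np.sin(el) * np.sin(az),        # P5  az axis tilt
        zero,                           # P7  elevation only
        zero,                           # P8  elevation only
        zero,                           # P9  elevation only
        (az_deg / 360.0) * np.cos(el),  # P10 az encoder scale error
    ])
    x_el = np.column_stack([
        zero, zero, zero,               # P1, P2, P3: cross-elevation only
        -np.sin(az),                    # P4  az axis tilt
        np.cos(az),                     # P5  az axis tilt
        one,                            # P7  el encoder fixed offset
        np.cos(el),                     # P8  gravitational flexure
        1.0 / np.tan(el),               # P9  residual refraction
        zero,                           # P10: cross-elevation only
    ])
    return x_xel, x_el


def pointing_error(params, az_deg, el_deg):
    """Sum the individual error functions (params ordered as PARAM_NAMES)."""
    x_xel, x_el = azel_regressors(az_deg, el_deg)
    params = np.asarray(params, dtype=float)
    return x_xel @ params, x_el @ params
```

Because both error components are linear in the parameters, the two matrices can be stacked and passed directly to a least squares or ridge solver.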


When modeling a system, one may select the model purpose 
to fall into one of three main categories: explanation, variable 
selection, or prediction. If the model is explanatory, then it 
represents the y in terms of the x’s and explains how the x’s 
affect the y. Variable selection techniques should be used 
when the goal is to determine which variables from a group of 
variables are important in determining the optimal model for 
y. This selection of variables could provide the best fit, the 
simplest form of the model, or both. Prediction, or forecasting,
techniques estimate the output, $\hat{y}$, at previously unobserved
values of the inputs, x.

The current pointing error model used in the DSN is of the 
explanatory type, and the parameters P are determined by per- 
forming a linear least squares fit on offset data collected from 
s/c or radio star observations. Currently, the regressor variables
are not truly independent; rather, they carry redundant information,
a condition known as multicollinearity [4]. Multicollinearity
limits the ability of an ordinary linear least squares fit to
provide stable and accurate parameter estimates.
It is therefore desirable to study alternate techniques for
parameter estimation. Ridge regression is a biased estimation 
technique for combating the multicollinearity problem. This 
article reviews the use of the ridge regression technique and 
demonstrates the advantages and limitations of its uses for 
systematic error correction development. 

II. Review of Regression Analysis 

Suppose that, in an experiment, values of the dependent 
variable y are observed, each corresponding to a particular 
value of an independent variable x. A straight-line representation
of the y = y(x) data would have the form

$$y = \beta_0 + \beta_1 x + \epsilon \qquad (1)$$

where $\epsilon$ is the model error. Equation (1) is a simple linear
regression model since it contains a single regressor variable,
x, and is linear in x.

The above linear regression of y upon a single variable x 
can be extended to the multiple linear regression model 

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i \qquad (2)$$

where i = 1, 2, ..., n (n > k + 1), $\epsilon_i$ is a conceptual random
model error assumed to be uncorrelated for each observation
(having a zero mean and a constant variance), $x_{ij}$ are the
independent variables (or regressors), $y_i$ are the dependent
variables (or response variables) and are the true responses,
and $\beta_j$ are the unknown regression parameters. One equation
can be written for each observation, and the error term $\epsilon_i$
allows the model to be an equality. In matrix terms, Eq. (2)
becomes

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \qquad (3)$$

Since the regression terms $\beta_j$ are unknown, let the least
squares estimators for these coefficients be $b_j$. These estimators
should satisfy the following equation:

$$\hat{y}_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik} \qquad (4)$$

where $\hat{y}_i$ are the model's estimated (or fitted) values of $y_i$ of
Eq. (2). Since Eq. (4) contains only known terms, it does not
contain the conceptual terms $\epsilon_i$.

If the initial model was accurate, then the difference between
$y_i$ and $\hat{y}_i$ should be small. The difference or residual,
$r_i$, between the actual values and the fitted values is

$$r_i = y_i - \hat{y}_i \qquad (5)$$

The method of least squares chooses the $b_j$ values so that

$$\sum_{i=1}^{n} r_i^2$$

is minimized. The estimates satisfy the following matrix equation
[4], [5]:

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \qquad (6)$$

where X' is the transpose of X. When the regressor variables
are centered (made dimensionless relative to a mean value),
X'X is then in correlation form and will be written as X*'X*.
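
As a minimal numerical sketch of Eq. (6), the following Python fragment fits a small synthetic data set by solving the normal equations and cross-checks the result against NumPy's general least squares routine. The data, the seed, and the "true" coefficients are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                  # n observations, k regressors

x = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), x])          # leading column of ones carries b_0
beta_true = np.array([1.0, 2.0, -0.5, 0.3])   # illustrative values only
y = X @ beta_true + 0.1 * rng.normal(size=n)  # Eq. (2) with random model error

# Eq. (6): b = (X'X)^{-1} X'y, solved without forming the inverse explicitly.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Independent check using an orthogonal-decomposition-based solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ b_normal                  # Eq. (5)
print("b          :", np.round(b_normal, 4))
print("sum of r^2 :", float(residuals @ residuals))
```

Solving the linear system rather than inverting X'X is the usual numerical practice; it gives the same estimate as Eq. (6) when X'X is well conditioned.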

III. Multicollinearity 

Multicollinearity exists when the regressor variables are 
empirically correlated, affecting the computation of b, which 
involves the X'X matrix. When this situation exists, no conclu- 
sions can be drawn as to the individual roles of the variables. 
If multicollinearity is “severe,” then the coefficients may 
(1) be the wrong size (too large in magnitude); (2) have the 
wrong sign; or (3) be unstable due to ill-conditioned matrix 
computations (i.e., small changes in the y's or x's lead to large
changes in the coefficients). Multicollinearity will also inhibit 
the ability to predict. 

Diagnostics can be performed to evaluate the extent of the 
multicollinearity problem. Large values in the correlation 
matrix are one indication of multicollinearity, but this obser- 
vation only shows pairwise correlations, not correlations that 
exist between more than two variables. Variance Inflation 
Factors (VIFs) are another means of identifying multicollin- 
earity. VIFs are the diagonal elements of the inverse of the 
correlation matrix and represent the inflation that each regres- 
sion coefficient experiences above the ideal (identity matrix). 
VIFs are considerably more useful for multicollinearity detec- 
tion than simple correlation values because they give a direct 
measure of multicollinearity and tell the user which coeffi- 
cients are adversely affected and to what extent. As a rule of 
thumb, VIFs greater than 10 indicate that a severe multicol- 
linearity problem exists. Table 2 gives a sample analysis of a 
set of conical scanning (conscan) offset data (collected during 
a Voyager 1 track on the 105th day of 1987) that exhibits a 
multicollinearity problem. A correlation value of zero means no
correlation, and ±1.0 means full correlation. The VIF data
from Table 2 indicate a severe multicollinearity problem.
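
The VIF diagnostic described above can be sketched in a few lines of Python. The example uses synthetic regressors (not the Voyager 1 offsets of Table 2), two of which are nearly identical by construction, and computes the VIFs as the diagonal of the inverse correlation matrix. It also cross-checks one value against the equivalent form 1/(1 - R_j^2), where R_j^2 comes from regressing one regressor on the others.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly a copy of x1: a collinear pair
x3 = rng.normal(size=n)               # unrelated regressor
X = np.column_stack([x1, x2, x3])

R = np.corrcoef(X, rowvar=False)      # correlation matrix of the regressors
vif = np.diag(np.linalg.inv(R))       # VIFs: diagonal of the inverse

# Cross-check for x1: VIF = 1 / (1 - R^2), with R^2 from regressing x1
# on the remaining regressors (plus a constant).
others = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(others, x1, rcond=None)
resid = x1 - others @ coef
total = (x1 - x1.mean()) @ (x1 - x1.mean())
vif_check = 1.0 / (resid @ resid / total)

print("correlation matrix:\n", np.round(R, 4))
print("VIFs:", np.round(vif, 1))
print("severe (VIF > 10):", vif > 10)
```

With these inputs the collinear pair is flagged by the rule of thumb while the unrelated regressor is not, which is the pattern the diagnostic is meant to expose.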

IV. Ridge Regression

Ordinary least squares methods give unbiased estimates and 
have the minimum variance of all linear unbiased estimators. 
However, there is no upper bound on what the variance could 
be, and the presence of multicollinearity could produce large 
variances. Ridge regression is a biased estimation technique
used to attain a substantial reduction in variance with an
increase in the stability of the coefficients. If the elements of
the inverse matrix are reduced, then the variance

$$\mathrm{var}(\mathbf{b}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1} \qquad (7)$$

is reduced and the stability of the coefficients is increased.
Ridge regression uses this idea. 

Variables x and y in Eq. (2) must first be standardized (centered),
making them dimensionless relative to an average value



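where i indexes the data points (i = 1, 2, ..., n) and j indexes the
parameters (j = 1, 2, ..., k). The new standardized
model becomes

$$\mathbf{y}^* = \mathbf{X}^*\boldsymbol{\beta}^* + \boldsymbol{\epsilon} \qquad (10)$$

and the solution for the least squares estimate $\mathbf{b}^*_{LS}$ is

$$\mathbf{b}^*_{LS} = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{y}^* \qquad (11)$$

where X*'X* is the correlation matrix, as stated previously.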

The ideal correlation matrix is the identity matrix, I. If
multicollinearity exists, the off-diagonal correlation values are
large, so the diagonal elements do not dominate. To make the
correlation matrix approach the identity matrix, the ridge
estimator is introduced:

$$\mathbf{b}^*_R = (\mathbf{X}^{*\prime}\mathbf{X}^* + k\mathbf{I})^{-1}\mathbf{X}^{*\prime}\mathbf{y}^* \qquad (12)$$

where I is the identity matrix and k is a value greater than or
equal to zero that is chosen by the user. The term kI adds a
positive constant to the diagonal elements of the correlation
matrix in order to make the diagonal elements dominate.
Accordingly, the inverse $(\mathbf{X}^{*\prime}\mathbf{X}^* + k\mathbf{I})^{-1}$ will have smaller
elements, alleviating the difficulties, such as large variances,
created by large elements on the diagonal of the inverse.
The term k is often referred to as a “shrinkage parameter”
since it “shrinks” the effects of the off-diagonal elements. The
ridge estimator $\mathbf{b}^*_R$ equals the least squares estimator $\mathbf{b}^*_{LS}$
when k = 0. It can also be easily converted back to dimensioned
form by a simple transformation.
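
The ridge estimator of Eq. (12) can be sketched in Python as follows. The standardization used here (each column centered and scaled to unit length, so that X*'X* equals the correlation matrix) is an assumption consistent with the text, not a transcription of the original standardization equations. The data are synthetic and deliberately collinear, and the function names are hypothetical.

```python
import numpy as np


def standardize(a):
    """Center each column and scale it to unit length (assumed scaling),
    so that the cross-product of the result is the correlation matrix."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))


def ridge(x_std, y_std, k):
    """Ridge estimator of Eq. (12); k = 0 reproduces least squares, Eq. (11)."""
    p = x_std.shape[1]
    return np.linalg.solve(x_std.T @ x_std + k * np.eye(p), x_std.T @ y_std)


# Synthetic, deliberately collinear data (not the Voyager 1 offsets).
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.02 * rng.normal(size=n)   # almost identical to x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + 0.5 * rng.normal(size=n)

Xs, ys = standardize(X), standardize(y)
b_ls = ridge(Xs, ys, 0.0)             # least squares estimate
b_ridge = ridge(Xs, ys, 0.02)         # ridge estimate for one trial k

print("least squares :", np.round(b_ls, 3))
print("ridge, k=0.02 :", np.round(b_ridge, 3))
```

For any k > 0 the length of the ridge coefficient vector is smaller than that of the least squares vector, which is the shrinkage the text refers to.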

Ridge regression is called a biased estimation technique
since the ridge estimators $\mathbf{b}^*_R$ are biased. Proper selection of
the shrinkage parameter limits the negative effect of large
bias while maintaining a ridge estimator variance that is
significantly less than that of the least squares estimator. As the
shrinkage parameter increases, the bias of the ridge estimator
increases and its variance decreases.
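
The trade-off can be made explicit with standard results for the ridge estimator, stated here as a supplement under the usual assumptions (fixed regressors, uncorrelated errors with variance $\sigma^2$). The symbol $\lambda_j$, denoting the eigenvalues of X*'X*, is introduced only for this note.

```latex
\begin{align*}
\mathrm{var}(\mathbf{b}^*_R)
  &= \sigma^2\,(\mathbf{X}^{*\prime}\mathbf{X}^* + k\mathbf{I})^{-1}\,
     \mathbf{X}^{*\prime}\mathbf{X}^*\,
     (\mathbf{X}^{*\prime}\mathbf{X}^* + k\mathbf{I})^{-1}, \\
\sum_{j} \mathrm{var}(b^*_{R,j})
  &= \sigma^2 \sum_{j} \frac{\lambda_j}{(\lambda_j + k)^2}, \\
E(\mathbf{b}^*_R) - \boldsymbol{\beta}^*
  &= -k\,(\mathbf{X}^{*\prime}\mathbf{X}^* + k\mathbf{I})^{-1}\boldsymbol{\beta}^*.
\end{align*}
```

At k = 0 the total variance reduces to $\sigma^2 \sum_j 1/\lambda_j$, which is dominated by the smallest eigenvalues when the regressors are nearly collinear, and the bias vanishes. For k > 0 each variance term shrinks while the bias grows in proportion to k.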

A subjective method exists for choosing the shrinkage
parameter: the ridge trace. Many different values of k are used
to compute $\mathbf{b}^*_R(k)$, and then each coefficient of $\mathbf{b}^*_R(k)$ is plotted versus k.
The more unstable a coefficient is, the faster it drops off before
stabilizing. Gradual changes of the coefficients with k denote
stability. The shrinkage parameter k is chosen so that the
estimates are stable. As a rule, the smallest value of k where
stability of the coefficients first appears is selected [4], [6].
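
The ridge trace can be sketched as a table of coefficients over a grid of k values. In the Python fragment below, the grid (0 to 0.10 in steps of 0.005) matches the one used in the case studies of Section V, while the data are synthetic. The helper `first_stable_k` is a hypothetical numerical stand-in for reading the trace by eye, and its tolerance is an arbitrary assumption. The standardization (centering and unit-length scaling) is likewise assumed.

```python
import numpy as np


def standardize(a):
    """Center each column and scale it to unit length (assumed scaling)."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))


def ridge(x_std, y_std, k):
    """Ridge estimator (X*'X* + kI)^{-1} X*'y* for standardized data."""
    p = x_std.shape[1]
    return np.linalg.solve(x_std.T @ x_std + k * np.eye(p), x_std.T @ y_std)


def ridge_trace(x_std, y_std, ks):
    """Rows: values of k; columns: ridge coefficients at that k."""
    return np.array([ridge(x_std, y_std, k) for k in ks])


def first_stable_k(ks, trace, tol=0.05):
    """Hypothetical stability rule: return the first grid value of k at which
    the largest coefficient change from the previous grid value, relative to
    the largest coefficient magnitude, falls below tol (None if never)."""
    steps = np.abs(np.diff(trace, axis=0)).max(axis=1)
    scale = np.abs(trace[1:]).max(axis=1)
    stable = np.nonzero(steps / scale < tol)[0]
    return ks[stable[0] + 1] if stable.size else None


# Synthetic collinear data (not the Voyager 1 offsets).
rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.02 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.5 * x3 + 0.5 * rng.normal(size=n)

Xs, ys = standardize(X), standardize(y)
ks = np.linspace(0.0, 0.10, 21)           # 0 to 0.10 in 0.005 increments
trace = ridge_trace(Xs, ys, ks)
k_star = first_stable_k(ks, trace)

for k, b in zip(ks, trace):
    print(f"k = {k:5.3f}  b* = {np.round(b, 3)}")
print("first stable k (hypothetical rule):", k_star)
```

Plotting each column of `trace` against `ks` gives the ridge trace itself; the numerical rule is only a convenience and does not replace the visual judgment described above.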


V. Two Case Studies 

Two applications of the ridge regression technique on the
systematic error correction model were done using Voyager 1
conical scanning (conscan) offset data. The results were compared
to fits obtained using an ordinary linear least squares
method. The selected parameters for the linear least squares fit
were (refer to Table 1) P1, P7, P8, P12, P13, P14, and P16. The
parameters selected for the ridge regression cases were P8, P12,
P13, P14, and P16. Parameters P1 and P7 represent constant
cross-elevation and elevation offsets, respectively. In the ridge
regression process, these two terms were created by determining
the cross-elevation and elevation offset biases.

The first data set uses conscan offset data collected on the 
105th day of 1987. As demonstrated in Table 2, this data 
exhibits a high degree of multicollinearity and would probably 
benefit from the use of ridge regression. Parameters determined
using the linear least squares method are listed in column 1
of Table 3. These parameters exhibit the characteristics
associated with multicollinearity, one of which is
coefficients that are too large in magnitude (they are too
large to be realistic or practical). Shrinkage parameters were
selected in 0.005 increments and ranged from 0 to 0.10. Figure 1
shows the use of the ridge trace for the “best” subjective
selection of ridge estimators. Stability seems to be reached at
approximately k = 0.02. The parameters for this shrinkage 
parameter are listed in column 2 of Table 3. The coefficients 
have diminished in value, approaching a more realistic repre- 
sentation. Figure 2 compares the residual fit errors obtained in 
both the linear least squares method and ridge regression. The 
residual errors are defined in Eq. (5) as the difference between 
the actual and the fitted pointing offsets. The signatures for 
both sets of residual errors are similar, indicating incomplete- 
ness in the model itself, but the average residual offset for the 
ridge regression case is nearly zero, and the standard deviations 
are similar (approximately 0.9 mdeg). 

The above example demonstrated how ridge regression can
be used to obtain more realistic parameters and smaller overall
fitting errors (average error approaching zero). Multicollinearity
also causes the parameters to be unstable. Conscan offset
data collected from Voyager 1 tracks on the 105th and 106th
days should yield similar results. No changes were made to any
part of the antenna mechanical subsystem between these two
consecutive tracking sessions (for example, the same a priori
systematic error correction table and autocollimators were
employed in both cases), yet the parameters determined using
the linear least squares fitting method (listed in columns 1 and
3 of Table 3) seem to indicate otherwise. The parameters not
only differ in sign, but also differ radically in magnitude.
Parameters determined using ridge regression (columns 2 and
4 of Table 3) are in closer agreement in both magnitude (differing
by a small factor of 3 or 4 rather than 10 or 20) and sign, and
also yield similar overall fits (same average and standard
deviation).
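
The day-to-day stability argument can be illustrated with a small simulation. The Python sketch below is not a reanalysis of the Voyager 1 data: it generates two synthetic "days" that share the same collinear regressors and the same underlying signal but have independent noise, then compares how far apart the two fitted coefficient vectors are under least squares (k = 0) and under ridge regression (k = 0.02, borrowed from the first case study). Sharing the regressors between days is a simplifying assumption, and the responses are only centered so that both fits stay in the same units.

```python
import numpy as np


def standardize_columns(a):
    """Center each column and scale it to unit length (assumed scaling)."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))


rng = np.random.default_rng(3)
n = 120
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # severe collinearity with x1
x3 = rng.normal(size=n)
Xs = standardize_columns(np.column_stack([x1, x2, x3]))
signal = Xs @ np.array([5.0, 5.0, 2.0])    # illustrative "true" coefficients


def fit(y, k):
    """Ridge fit on the shared regressors; k = 0 gives least squares."""
    yc = y - y.mean()
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(Xs.shape[1]), Xs.T @ yc)


day_a = signal + rng.normal(size=n)        # two tracks: same system,
day_b = signal + rng.normal(size=n)        # independent measurement noise

k = 0.02
gap_ls = np.linalg.norm(fit(day_a, 0.0) - fit(day_b, 0.0))
gap_ridge = np.linalg.norm(fit(day_a, k) - fit(day_b, k))

print("least squares, day A:", np.round(fit(day_a, 0.0), 2))
print("least squares, day B:", np.round(fit(day_b, 0.0), 2))
print("ridge,         day A:", np.round(fit(day_a, k), 2))
print("ridge,         day B:", np.round(fit(day_b, k), 2))
print("coefficient gap  LS:", round(float(gap_ls), 3),
      " ridge:", round(float(gap_ridge), 3))
```

With shared regressors the ridge gap is always smaller than the least squares gap for k > 0, because adding k to the diagonal reduces every eigenvalue of the inverse; how much smaller depends on the severity of the collinearity.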

VI. Conclusion 

The ridge regression technique was shown to be useful in 
minimizing the effects of multicollinearity. For the two exam- 
ples given, it generated stable coefficients for similar sets of 
data, provided coefficients that were more realistic in magni- 
tude, and gave an overall fit with average residual errors near 
zero. Although these are good results in terms of coefficient 
characteristics, the overall fitting results using ridge regression 
were no better than the linear least squares results since the 
signatures resulting from the two methods exhibited analogous 
trends. A technique such as variable selection or prediction
may be needed in order to obtain a better model and a
better parameter selection procedure. In any case, the problem
of multicollinearity must still be addressed and resolved.


References 

[1] P. Stumpff, “Astronomical Pointing Theory for Radio Telescopes,” Kleinheubacher
Berichte, vol. 15, Fernmeldetechnisches Zentralamt, Darmstadt, West Germany,
pp. 431-437, 1972.

[2] M. L. Meeks, H. A. Ball, and A. B. Hull, “The Pointing Calibration of the Haystack
Antenna,” IEEE Transactions on Antennas and Propagation, vol. AP-16, no. 6,
pp. 746-751, November 1968.

[3] C. N. Guiar, F. L. Lansing, and R. Riggs, “Antenna Pointing Systematic Error Model
Derivations,” TDA Progress Report 42-88, vol. October-December 1986, Jet Propulsion
Laboratory, Pasadena, California, pp. 36-46, February 15, 1987.

[4] R. H. Myers, Classical and Modern Regression with Applications, Boston: Duxbury,
1986.

[5] E. Kreyszig, Advanced Engineering Mathematics, New York: John Wiley and Sons,
1983.

[6] A. J. Bush, “Ridge: A Program to Perform Ridge Regression Analysis,” Behavior
Research Methods and Instrumentation, vol. 12, no. 1, pp. 73-74, 1980.




Table 1. Systematic pointing error sources and model terms

                                            Model function
Error source              Cross-elevation error        Elevation error
---------------------------------------------------------------------------
Az collimation            P1                           -
Az encoder fixed offset   P2 cos (el)                  -
Az/el skew                P3 sin (el)                  -
Az axis tilt              P4 sin (el) cos (az)         -P4 sin (az)
Az axis tilt              P5 sin (el) sin (az)         P5 cos (az)
El encoder fixed offset   -                            P7
Gravitational flexure     -                            P8 cos (el)
Residual refraction       -                            P9 cot (el)
Az encoder scale error    P10 (az/360) cos (el)        -

Error source              Cross-declination error      Declination error
---------------------------------------------------------------------------
HA/dec axis skew          P11 sin (dec)                -
HA axis tilt              P12 sin (HA) sin (dec)       P12 cos (HA)
HA axis tilt              -P13 cos (HA) sin (dec)      P13 sin (HA)
HA feed offset            P14                          -
Gravitational flexure     P15 cos (p) cos (el)         -P15 sin (p) cos (el)
Declination feed offset   -                            P16
Gravitational flexure     P17 sin (p) cos (el)         -
Gravitational flexure     -                            -P18 cos (p) cos (el)
Gravitational flexure     -P19 sin (el)                -
Gravitational flexure     -                            P20 sin (el)
HA encoder bias           P21 cos (dec)                -

Note: (1) Uppercase P refers to parameter value; lowercase p refers to parallactic angle.
      (2) Az = azimuth angle; el = elevation angle; dec = declination angle; HA = hour angle.



Table 2. Sample correlation matrix and variance inflation factors (for Voyager 1 conscan
offset data from 105th day of 1987)

                          Correlation matrix
Variable        8         12         13         14         16          VIF
-----------------------------------------------------------------------------
8             1.0000     0.8653    -0.9911    -0.9954     0.9937      747.6
12            0.8653     1.0000    -0.8624    -0.8690     0.8994      103.9
13           -0.9911    -0.8624     1.0000     0.9981    -0.9959     2783.1
14           -0.9954    -0.8690     0.9981     1.0000    -0.9966      929.0
16            0.9937     0.8994    -0.9959    -0.9966     1.0000     4377.6


Table 3. Model parameters for two Voyager 1 conscan offset data sets (105th and 106th days of
1987) using linear least squares and ridge regression (units are in millidegrees)

                        Day 105                          Day 106
Parameter     Linear least    Ridge            Linear least    Ridge
(P)           squares (1)     regression (2)   squares (3)     regression (4)
-------------------------------------------------------------------------------
1              -451.93          24.08*            -0.55           18.68*
7              -141.18          29.58*            17.86           17.13*
8              -271.78         -13.45            -27.22          -29.11
12             -114.05           4.28              1.40            1.34
13              240.03          -4.10             -0.65          -17.24
14             -557.56           4.85              6.19           15.26
16              103.29           0.25              0.49            0.89

*Ridge regression parameters P1 and P7 are created by determining the cross-elevation and elevation
biases.







